ABSTRACT



This paper reviews the basic processes of exploratory factor 
analysis (EFA) with regard to evaluating test score validity. Construct 
validity is the focus of the paper. The similarity between the processes of 
construct validation and EFA is described, and the use of EFA as a tool to
evaluate score validity is examined. Factor analysis is a method for
determining the number and nature of the variables that underlie large 
numbers of variables or measures. It tells the researcher what tests or 
measures belong together. Construct validity is studied when the test user 
wants to draw an inference from the test score to performances that can be 
grouped under the label of a particular psychological construct. Factor 
analysis, long associated with construct validity, is a useful tool to 
evaluate score validity. It is emphasized that validity is not a property of 
tests, but rather a property of test scores. The identification of the number 
of factors that underlie a set of variables and the determination of whether 
factors are correlated or uncorrelated can be helpful in evaluating test 
score validity. (Contains 24 references.) (SLD) 
Abstract

The present paper reviews the basic processes of exploratory factor analysis (EFA) as 
regards evaluating test score validity. It is emphasized in this paper as well as 
elsewhere that scores (not tests) vary in degrees of validity (Thompson, 1994). 
Construct validity is the focus of the paper, as there has been an increasing evolution 
toward a more unified view of validity (Shepard, 1993). The similarity between the
processes of construct validation and exploratory factor analysis is described, and the
utility of EFA as a tool to evaluate score validity is explored.
Basic Concepts in Exploratory Factor Analysis (EFA) as a Tool to
Evaluate Score Validity: A Right-Brained Approach

Psychology evolved from the disciplines of philosophy and physiology. As Weiten
(1992) noted, “Philosophy provided the attitude, and physiology contributed the 
[scientific] method” (p. 3). Inherent in the scientific method, which is based on 
observation and measurement, are numerous statistical procedures. 

One of the discoveries of psychology that researchers continue to explore and 
analyze with these statistical procedures addresses the differences between the right 
and left brain hemispheres. A popularly held suggestion is that the hemispheres exhibit 
different modes of thinking. It is suggested that the left hemisphere predominantly 
thinks in verbal terms, is analytic, rational, logical and linearly oriented. The right 
hemisphere’s modes of thinking include nonverbal, nonrational, intuitive and holistic 
approaches (Weiten, 1992). 

The connection between the statistical procedures used by psychologists in
their research and right vs. left brain functioning is that, while the content can be
processed in either hemisphere, some professors teach the material in a technical
manner amenable to those persons informally known as "left-brainers". Other teachers
present statistical concepts in an intuitive way. This approach appeals to the
"right-brainers" in the world (or in this case, the statistics courses)! As readers of this article
may have surmised, the author of this paper falls into the latter category. For this 
reason, the present statistically-oriented paper is written from a right hemisphere 
perspective, i.e., conceptually and intuitively. 

This paper came about as a result of a literature review on the topic of 
substance abuse, this author's primary area of interest. In the process, it became clear 
that a new instrument for determining alcohol dependence may be warranted. The 
decision was made to pursue the undertaking of creating such an instrument. One of 
the specific questions to be answered as part of the undertaking was, "Will this
instrument measure what it is intended to measure?" It was suggested that an understanding of
the concepts of construct validity and factor analysis would aid in answering the stated 
question. Furthermore, it was hinted that conducting a factor analysis would specifically 
address construct validation, and in the process, prove a useful tool in evaluating score 
validity. 

This paper, therefore, will address both construct validity and factor analysis. 
Specifically, both concepts will be defined and their processes highlighted. In the 
discussion concerning validity, it will be emphasized that it is the inferences from 
scores of tests that are (or are not) valid, not the tests themselves. Finally, it will be 
shown that exploratory factor analysis is a tool for use in the evaluation of score 
validity, particularly in reference to construct validity. 

Factor Analysis and Construct Validity 

Factor Analysis 

A statistics professor of this author has frequently noted that a great many 
issues in statistical analyses are designed to confuse graduate students. This holds 
true regarding the definitions of many concepts. Factor analysis is an example of a 
topic that has been defined in a variety of ways. Reyment and Joreskog (1993) stated: 
Factor analysis is a generic term that we use to describe a number of methods 
designed to analyze interrelationships within a set of variables or objects 
[resulting in] the construction of a few hypothetical variables (or objects), 
called factors, that are supposed to contain the essential information in a larger 
set of observed variables or objects .... that reduces the overall complexity of the 
data by taking advantage of inherent interdependencies [and so] a small number 
of factors will usually account for approximately the same amount of information 
as do the much larger set of original observations. (p. 71)
Cureton and D'Agostino (1983) described factor analysis as "a collection of
procedures for analyzing the relations among a set of random variables observed or
counted or measured for each individual of a group" (p. 1). The purpose, they said, "is
to account for the intercorrelations among q variables, by postulating a set of common
factors, considerably fewer in number than the number, q, of these variables" (p. 2).
Bryman and Cramer (1990) broadly defined factor analysis as "a number of related 
statistical techniques which help us to determine them [the characteristics which go 
together]" (p. 253). 

Gorsuch (1983) reminded the reader that "all scientists are united in a common 
goal: they seek to summarize data so that the empirical relationships can be grasped 
by the human mind" (p. 2). The purpose of factor analysis, he said, "is to summarize the 
interrelationships among the variables in a concise but accurate manner as an aid in 
conceptualization" (p. 2). 

These definitions most likely make a great deal of sense to those "left-brained" 
individuals who understand complex things fairly easily. Kerlinger (1979) gave both a 
left-brained and a right-brained definition of factor analysis. For the left-brainers: 

"Factor analysis is an analytic method for determining the number and nature of the 
variables that underlie larger numbers of variables or measures" (p. 180). And for the 
right-brainers he noted: "It [factor analysis] tells the researcher, in effect, what tests or 
measures belong together, which ones virtually measure the same thing, in other
words, and how much they do so" (p. 180). He further commented on factor analysis in 
terms of curiosity and parsimony. He noted, "Scientists are curious. They want to know 
what's there and why. They want to know what is behind things. And they want to do 
this in as parsimonious a fashion as possible. They do not want an elaborate 
explanation when it is not needed" (p. 179). He sounds like a very right-brained
individual! 
Each definition of factor analysis has common elements. Each refers in some
way to the correlations among variables as reflected by the use of the words 
interrelationships, intercorrelations and relations. Further, each definition makes clear 
the notion of reducing the number of variables into a smaller set of factors. In short, 
factor analysis helps to explain things by reducing large amounts of information into a 
manageable form and size. Now that is an explanation that right-brained individuals
(and, of course, left-brainers, too) can comprehend!

Validity 

Having offered a basic understanding of what factor analysis is, the paper next
turns to validity: specifically, to construct validity and to what the notion of validity has
to do with factor analysis.

Cronbach (1971) discussed validation as a process used by a test developer or
test user to collect evidence that supports the types of inferences to be drawn from test
scores. Different aspects of validity have been defined. Crocker and Algina (1986)
discussed three types of validation studies conducted to gather evidence of the 
usefulness of scores in addressing a specified inference. Content validity studies are 
used to assess whether the items on an inventory or test adequately represent the 
construct of specific interest. In other words: Can the researcher draw an inference 
from an examinee's test score to a larger domain of items like those that are on the test
itself? Criterion-related validity, encompassing both predictive validity and concurrent
validity, is studied in situations where a test user wants to draw an inference from a
person's test score to performance on a real behavioral variable that has practical
importance. Construct validity is studied when "the test user desires to draw an 
inference from the test score to performances that can be grouped under the label of a 
particular psychological construct" (Crocker & Algina, 1986, p. 218). 
Four types of validity were identified by the American Psychological Association
(APA) when validity standards were first codified in 1954. These four types
corresponded to different aims of testing: (a) content validity, (b) predictive validity, (c)
concurrent validity, and (d) construct validity (American Psychological Association, 
1954). In 1966 the APA reduced predictive validity and concurrent validity to a single 
category: criterion-related validity (American Psychological Association, 1966). 

Construct validity = factorial validity? = the only validity? 

It has been suggested that construct validity encompasses both criterion and 
content validity. Shepard (1993) noted that construct validity envelops the empirical
and the logical requirements of criterion and content validity. Anastasi (1986) agreed 
that construct validity subsumes both content validity and criterion-related validity 
requirements. 

Nunnally (1978) reported that "construct validity has [even] been spoken of as ...
'factorial validity'" (p. 111). As long as 50 years ago, this concept was acknowledged
by Guilford (1946): "The factorial validity of a test is given by its loadings in meaningful, 
common, reference factors. This is the kind of validity that is really meant when the 
question is asked: Does this test measure what it is supposed to measure?" (p. 428, 
emphasis added). 

Again, 44 years later, Bryman and Cramer (1990) noted that "factor analysis 
enables us to assess the factorial validity of the questions which make up our scales by 
telling us the extent to which they seem to be measuring the same concepts or 
variables" (p. 253). 

Construct validity 

Because the present paper focuses on construct validity, that concept will be
addressed further. Construct validity, although defined in the previous section, was
explained in a manner somewhat more amenable to the right-brained population by
Heppner, Kivlighan and Wampold (1992). This definition of
construct validity focused on how well the variables chosen by the researcher to 
represent a hypothetical construct really "capture the essence" (p. 46) of that 
hypothetical construct. Stated differently, construct validity is "the degree to which the
measured variables used in the study represent the hypothesized constructs" (p. 47). 
Stated even more simply, construct validity answers the question: Does this test or 
instrument really measure what it is intended to measure? 

It should be apparent at this point, to all persons, regardless of brain hemisphere 
dominance, that factor analysis has long been associated with construct validity. It 
follows, then, that factor analysis is a useful tool with which to evaluate score validity.

Validity of test scores 

Frequently, both lay persons and professionals can be heard commenting about 
tests being reliable or valid. Even the right-brained population can clearly see the error 
in such a statement. A test is just a test. Validity and reliability are functions of the test 
scores, which depend on the particular test takers. In addition, as pointed out by Shepard (1992),
"Validity must be established for each particular use of a test" (p. 406). Cronbach
(1971) agreed: "One validates, not a test, but an interpretation of data arising from a 
specified procedure" (p. 447). 

Crocker and Algina (1986) described a process used to gather evidence of construct
validity for an instrument. In addition, they described four procedures (one being factor
analysis) frequently utilized in construct validation. Regardless of the specific technique 
used, the steps generally followed include (a) formulating a hypothesis about how 
those who differ on the proposed construct do in fact differ in relation to other 
constructs already validated, (b) selecting or developing a measurement instrument 
that consists of items specifically representing the construct, (c) gathering empirical 
data so the hypothesized relationships can be tested, and (d) determining if the data
are consistent with the hypothesis. 

Heppner, Kivlighan, and Wampold (1992) delineated steps to be taken in factor 
analysis that curiously resemble those stated above as they apply to construct 
validation: (a) the researcher must first carefully think about the specific research 
question he or she wishes to address, (b) he or she chooses to use or develop an 
instrument constituting the variables specified, (c) the researcher selects the sample, 
collects the data, and begins to factor analyze the data in order to identify the common 
dimensions of a set of variables and to see which items go together to make up a 
factor, and (d) the researcher determines if the factors are correlated. See? It's starting 
to come together. We're finding out: Are the test items measuring what they're 
supposed to be measuring? Construct validity and factor analysis constitute a natural 
pairing. 

It becomes evident (even to those with the heaviest of right brains) that factor 
analysis applies to construct validity. It should be clear at this point that one of the 
purposes of factor analysis is to determine the factors that underlie a given set of 
variables. In addition, the reader has hopefully been able to establish the connection 
between factor analysis and its usefulness as a tool in evaluating score validity. In
other words, by conducting a factor analysis of the observed scores on a given
instrument, one can determine whether the test is, indeed, measuring the variables it
purports to measure. This, in essence, is the definition of construct validation.

Factor Analysis: Exploratory Versus Confirmatory

As has been shown regarding many other topics in statistics, various definitions 
exist for both exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). 
A sound left-brained definition is provided by Stevens (1996) for each of these two 
types of factor analysis: 
The purpose of exploratory factor analysis is to identify the factor
structure or model for a set of variables. This often involves determining 
how many factors exist, as well as the pattern of the factor loadings ... 

EFA is generally considered to be more of a theory-generating than 
a theory-testing procedure. In contrast, confirmatory factor analysis 
(CFA) is generally based on a strong theoretical and/or empirical 
foundation that allows the researcher to specify an exact factor model in 
advance. This model usually specifies which variables will load on 
which factors, as well as such things as which factors are correlated. 

It is more of a theory-testing procedure than is EFA. (p. 389) 

Fortunately, Stevens (1996) also provided a table, giving a visual representation 
of the above definition (a definite plus for the right-brainers): 



EXPLORATORY                                 CONFIRMATORY
THEORY GENERATING                           THEORY TESTING
Heuristic - weak literature base            Strong theory and/or strong empirical base
Determine the number of factors             Number of factors fixed a priori
Determine whether the factors are           Factors fixed a priori as correlated
  correlated or uncorrelated                  or uncorrelated
Variables free to load on all factors       Variables fixed to load on a specific
                                              factor or factors
(p. 389)



By determining how many factors underlie the variables and whether those factors are
correlated, EFA helps answer the question posed by construct validity: Do the scores on
this test measure what the test is supposed to be measuring? Attention is next turned to
the specific process of exploratory factor analysis.

Exploratory factor analysis (EFA) 

Before entering into a discussion as to the process of factor analysis, it is 
necessary to touch upon a fundamental assumption upon which these procedures are 
based. Without going into a lengthy explanation of the concept of linearity, suffice it to
say that: 

Factor analysis assumes that the observed (measured) variables are linear 
combinations of some underlying source variables (or factors). That is, 
it assumes the existence of a system of underlying factors and a system of 
observed variables. There is a certain correspondence between these two 
systems and factor analysis "exploits" this correspondence to arrive at 
conclusions about the factors. (Kim, 1986, p. 8) 
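To make the linearity assumption concrete, the following minimal Python sketch (an illustration, not a procedure taken from Kim, 1986) simulates scores from a hypothetical one-factor model in which each observed variable equals a loading times an unobserved factor score plus a unique error. The loadings, sample size, and random seed are invented for the example. Under this model, the correlation between two observed variables should come out close to the product of their loadings, which is the kind of "correspondence" between factors and observed variables that factor analysis exploits.

```python
import random

# Hypothetical loadings for four observed variables on one latent factor.
LOADINGS = [0.8, 0.7, 0.6, 0.5]


def simulate_scores(n_people, loadings, seed=0):
    """Simulate an n_people x len(loadings) data matrix from a one-factor model.

    Each observed score is x_i = loading_i * f + e_i, where f is the latent
    factor score and e_i is a unique (error) part.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_people):
        f = rng.gauss(0.0, 1.0)  # latent factor score: never observed directly
        row = []
        for lam in loadings:
            # Error SD chosen so every observed variable has variance 1.
            e = rng.gauss(0.0, (1.0 - lam ** 2) ** 0.5)
            row.append(lam * f + e)
        data.append(row)
    return data


def column(data, j):
    """Return column j of a data matrix stored as a list of rows."""
    return [row[j] for row in data]


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    scores = simulate_scores(5000, LOADINGS)
    observed = pearson_r(column(scores, 0), column(scores, 1))
    implied = LOADINGS[0] * LOADINGS[1]  # 0.56 under the model
    print(f"observed r = {observed:.2f}, model-implied r = {implied:.2f}")
```

The factor itself never appears in the output; only its footprint in the correlations does, which is why factors are called latent.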

As noted previously, exploratory factor analysis can be used as a method of 
determining the minimum number of underlying hypothetical factors that represent 
a larger number of variables. In exploratory factor analysis, this is done by examining the
intercorrelations among the variables without prior specifications of what these
factors might be.

Factors have not, up to this point, been specifically defined. There are, again, 
many definitions provided, enough to please both left- and right-brained individuals. 
Among them is Cureton and D'Agostino's (1983) definition: "The factors are random
variables that cannot be observed or counted or measured directly, but which are 
presumed to exist in the population and hence in the experimental sample .... they are 
sometimes termed latent variables" (p. 3). Tinsley and Tinsley (1987) stated that factors 
are "hypothetical constructs or theories that help interpret the consistency in a data set" 
(p. 414). Kline (1994) defined a factor as a "dimension or construct which is a 
condensed statement of the relationship between a set of variables" (p. 5). Kim and 
Mueller's (1978) definition stated that factors are "hypothesized, unmeasured, and
underlying variables which are presumed to be the sources of the observed variables 
... which are smaller in number than the number of observed variables, [and] are 
responsible for the covariation among the observed variables" (pp. 77, 12). Cureton
and D'Agostino (1983) clarified the hypothetical nature of the factors: 

The factors are actually hypothetical or explanatory constructs. Their reality in 
the individuals of the population or sample is always open to argument. At the 
conclusion of a factor analysis we can only say of the factors that if they were 
real, then they would account for the correlations found in the sample. (p. 3)

In essence, then, factors are the latent (unobserved), hypothetical, underlying concepts 
(or constructs) deduced from the correlations between the measured variables of the 
instrument or test. (Notice that the term construct appears in the definition of factor; it is
no wonder that exploratory factor analysis is associated with construct validity.)

The Process of Factor Analysis 
Data matrix 

The first step in an exploratory factor analysis is to display the data in a data
matrix. A data matrix is "any array of numbers with one or more rows and one or more
columns" (Reyment & Joreskog, 1993, p. 15). This appears to be quite straightforward
(much to the surprise and relief of the right-brained). Ah, but not so fast. In an effort to
complicate matters, there are issues of a vector (a matrix that has only one row or only
one column) and a scalar (a matrix that has exactly one row and one column), as well as a
variety of matrices identified by Gorsuch (1983) in developing factor analytic concepts. (The right-brained
among you are possibly noticing a constriction of air passages at the number of 
possible options, but not to worry. This is merely an introductory paper on the topic of 
factor analysis.) 
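For readers who find a concrete picture easier than a definition, here is a small Python sketch of these terms. It assumes the data matrix is stored as a list of rows, one row per examinee and one column per item; the scores are hypothetical. In plain Python a single selected entry is simply a number, which is how the scalar is represented here.

```python
# Hypothetical data matrix: 4 examinees (rows) x 3 items (columns).
data_matrix = [
    [4, 3, 5],
    [2, 2, 3],
    [5, 4, 4],
    [3, 3, 2],
]

n_rows = len(data_matrix)       # number of examinees
n_cols = len(data_matrix[0])    # number of items

row_vector = data_matrix[0]                      # one row: examinee 1's responses
column_vector = [row[1] for row in data_matrix]  # one column: every score on item 2
scalar = data_matrix[2][0]                       # one row and one column: a single entry

print(n_rows, n_cols)   # 4 3
print(row_vector)       # [4, 3, 5]
print(column_vector)    # [3, 2, 4, 3]
print(scalar)           # 5
```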

Correlation matrices. 

In order to determine the factors underlying the variables, a "variable reduction
scheme" (Gorsuch, 1983, p. 362) is used which shows how the variables cluster
together, that is, how strongly the variables are correlated with one another. These
correlations are represented in a matrix of association. A statistical measure of
association such as the Pearson r is used to indicate the magnitude of the correlations.

A correlation (or variance-covariance) matrix represents the relationships
among the set of variables in the study. In a correlation matrix, the values located on the
diagonal will be 1.0, because each variable correlates perfectly with itself, and the
off-diagonal elements are the correlations between all variable pairs. (In a
variance-covariance matrix, the diagonal instead holds each variable's variance, and the
off-diagonal elements are the covariances.) Remember, right-brainers, the off-diagonal
entries simply show how strongly each pair of variables is related.
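The following Python sketch (standard library only, hypothetical scores) computes a Pearson correlation matrix from a small data matrix, so the 1.0 diagonal and the mirror-image off-diagonal correlations described above can be seen directly. In practice a statistics package would handle this step; the function name correlation_matrix is illustrative, and the sketch assumes no variable has zero variance.

```python
def correlation_matrix(data):
    """Pearson correlation matrix for a data matrix (rows = people, columns = variables).

    Assumes every variable has nonzero variance (otherwise r is undefined).
    """
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    dev = [[row[j] - means[j] for j in range(p)] for row in data]  # deviation scores
    ss = [sum(d[j] ** 2 for d in dev) for j in range(p)]           # sums of squares
    r = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(j, p):
            cross = sum(d[j] * d[k] for d in dev)                  # sum of cross-products
            r[j][k] = r[k][j] = cross / (ss[j] * ss[k]) ** 0.5     # the matrix is symmetric
    return r


# Hypothetical scores: 4 examinees (rows) x 3 items (columns).
data_matrix = [
    [4, 3, 5],
    [2, 2, 3],
    [5, 4, 4],
    [3, 3, 2],
]
for row in correlation_matrix(data_matrix):
    print([round(v, 2) for v in row])
# [1.0, 0.95, 0.6]
# [0.95, 1.0, 0.32]
# [0.6, 0.32, 1.0]
```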

Because the matrix holds a correlation for every pair of variables, the number of correlations
grows rapidly with the number of variables used in a study, and a single correlation matrix may have thousands of
entries. Factor analysis, explained Hetzel (1995), "attempts to simplify the correlation 
matrix by accounting for a large number of relationships with a smaller number of 
explanatory constructs [i.e., factors]" (p. 7). He further stated that these hypothetical 
factors are determined by examining additional data matrices, specifically the factor 
pattern matrix and the factor structure matrix.
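A quick worked count shows how fast the matrix grows. With p variables the matrix is p × p; setting aside the p diagonal entries and counting each pair of variables only once leaves

```latex
\text{distinct correlations} = \frac{p(p-1)}{2},
\qquad p = 100 \;\Rightarrow\; \frac{100 \times 99}{2} = 4950
```

so a hypothetical 100-item instrument produces a 100 × 100 matrix (10,000 entries) containing 4,950 distinct correlations.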

In much of the literature on factor analysis, the term "factor loading" is used
instead of the more accurate terms, factor pattern coefficients and factor structure
coefficients, which are the elements comprising the factor pattern and factor structure
matrices. The exact nature of these coefficients and corresponding matrices is beyond
the scope of this paper. The important element is that factor pattern coefficients
represent the relationship of a specific variable to a specific factor with the influence
of the other factors removed (Stevens, 1992). The factor structure coefficients can be thought of
as being identical to structure coefficients in other types of correlational analyses.
These coefficients show the correlations of the variables with the factors (Hetzel,
1995). It is with the results of these additional matrices, and through the careful
interpretation of the data, that the factors are extracted and interpretations made.
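Although the details are beyond this paper, one relationship is worth seeing: in the common factor model, the structure matrix equals the pattern matrix multiplied by the matrix of correlations among the factors (S = PΦ). The Python sketch below uses invented coefficients for four variables and two factors. It shows that when the factors are correlated the two matrices differ, and when the factors are uncorrelated (Φ is the identity matrix) they are identical, which is one reason the single word "loading" can be ambiguous.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


# Hypothetical pattern matrix P: 4 variables (rows) x 2 factors (columns).
P = [
    [0.70, 0.00],
    [0.60, 0.10],
    [0.05, 0.65],
    [0.00, 0.75],
]

# Hypothetical factor correlation matrices (Phi).
PHI_OBLIQUE = [[1.0, 0.3], [0.3, 1.0]]     # factors correlate .30
PHI_ORTHOGONAL = [[1.0, 0.0], [0.0, 1.0]]  # uncorrelated factors: identity matrix

# Structure coefficients (variable-factor correlations): S = P * Phi.
S_oblique = matmul(P, PHI_OBLIQUE)
S_orthogonal = matmul(P, PHI_ORTHOGONAL)

print([[round(v, 3) for v in row] for row in S_oblique])
print(S_orthogonal == P)  # True: with uncorrelated factors, structure equals pattern
```

Note how variable 1 has a pattern coefficient of zero on factor 2 yet still correlates .21 with that factor, purely because the two factors are correlated.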
Extracting the factors

We are reminded by Cattell (1978) that "factor analysis is, in principle, nothing 
more than asking what the common elements are when one knows the correlation" (p. 
20). It is at this point, once we have calculated the correlations among the variables,
that we can begin to determine the number of factors underlying the
variables. The chief concern at this stage, according to Kim and Mueller (1978), is
whether a smaller number of factors can account for the covariation among the original,
larger set of variables. Gorsuch (1983) indicated that there are numerous methods that
can be used in deciding how many factors to retain. Again, these methods are too 
detailed for the current paper, but in general, regardless of the method used, he 
suggested that "one would want to account for at least 70% of the total variance" (p. 
367). 
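As a rough illustration of the 70% guideline, the sketch below computes the eigenvalues of a correlation matrix with the Jacobi method and reports the cumulative proportion of total variance. Two assumptions should be stated plainly: the eigenvalues come from the unreduced correlation matrix (a principal components extraction, used here only as a stand-in for the many extraction methods Gorsuch describes), and the 4 × 4 correlation matrix is invented. For a correlation matrix the total variance equals the number of variables, p, so each eigenvalue divided by p is a proportion of variance.

```python
import math


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=50):
    """Eigenvalues of a symmetric matrix using cyclic Jacobi rotations."""
    n = len(a)
    m = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(m[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if m[p][q] == 0.0:
                    continue
                # Angle chosen so the rotated element m[p][q] becomes zero.
                theta = 0.5 * math.atan2(2 * m[p][q], m[q][q] - m[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # m <- m J (update columns p and q)
                    mkp, mkq = m[k][p], m[k][q]
                    m[k][p] = c * mkp - s * mkq
                    m[k][q] = s * mkp + c * mkq
                for k in range(n):  # m <- J^T m (update rows p and q)
                    mpk, mqk = m[p][k], m[q][k]
                    m[p][k] = c * mpk - s * mqk
                    m[q][k] = s * mpk + c * mqk
    return [m[i][i] for i in range(n)]


def cumulative_variance(r):
    """Eigenvalues (largest first) and the cumulative share of total variance."""
    eig = sorted(jacobi_eigenvalues(r), reverse=True)
    p = len(r)  # total variance of a correlation matrix = its trace = p
    shares, running = [], 0.0
    for lam in eig:
        running += lam
        shares.append(running / p)
    return eig, shares


def factors_to_reach(shares, target=0.70):
    """Smallest number of factors whose cumulative share meets the target."""
    for k, share in enumerate(shares, start=1):
        if share >= target:
            return k
    return len(shares)


# Hypothetical correlations: items 1-2 and items 3-4 form two clusters.
R = [
    [1.0, 0.6, 0.1, 0.1],
    [0.6, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.6],
    [0.1, 0.1, 0.6, 1.0],
]
eigenvalues, shares = cumulative_variance(R)
print([round(v, 2) for v in eigenvalues])  # [1.8, 1.4, 0.4, 0.4]
print([round(s, 2) for s in shares])       # [0.45, 0.8, 0.9, 1.0]
print(factors_to_reach(shares))            # 2
```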

The critical point in deciding how many factors to retain is that this decision 
requires the researcher to carefully consider the data and to use his or her judgment. 

As with many other statistical concepts, a number of decision rules are available to 
help guide the researcher with the decision as to the specific number of factors to 
retain. This topic was summarized by Hetzel (1995):

Regardless of the rules eventually used, when considering the number of 
factors to retain, it is important for the researcher to remember the 
advantages and limitations of the various decision rules and to make a 
subsequent decision in a thoughtful and well-reasoned manner, based 
on the nature of the analysis. (p. 17)
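To see why the choice of rule matters, the sketch below applies two rules to the same set of eigenvalues: the cumulative-variance guideline Gorsuch mentions (70% of total variance) and the eigenvalue-greater-than-one rule usually credited to Kaiser, which is added here only for comparison and is not discussed in the sources above. The eigenvalues are hypothetical. The two rules disagree (two factors versus three), which is exactly the situation in which the researcher's judgment is needed.

```python
def kaiser_rule(eigenvalues, cutoff=1.0):
    """Number of factors whose eigenvalue exceeds the cutoff (Kaiser's rule)."""
    return sum(1 for lam in eigenvalues if lam > cutoff)


def cumulative_variance_rule(eigenvalues, target=0.70):
    """Fewest factors whose eigenvalues reach the target share of total variance."""
    total = sum(eigenvalues)
    running = 0.0
    for k, lam in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += lam
        if running / total >= target:
            return k
    return len(eigenvalues)


# Hypothetical eigenvalues from a 6-variable correlation matrix (they sum to 6).
EIGENVALUES = [2.5, 1.1, 0.9, 0.7, 0.5, 0.3]

print("Kaiser rule:", kaiser_rule(EIGENVALUES))                   # 2
print("70% of variance:", cumulative_variance_rule(EIGENVALUES))  # 3
```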

Interpretation of the factors 

Following the initial extraction of factors, an interpretation of these factors is 
necessary. Kim and Mueller (1978) pointed out:
It is important to emphasize ... that factor analysis does not tell the researcher
what substantive labels or meaning to attach to the factors. This decision 
must be made by the researcher. Factor analysis is purely a statistical 
technique indicating, which, and to what degree, variables relate to 
an underlying and undefined factor. The substantive meaning given to 
a factor is typically based on the researcher's careful examination of what the 
high loading variables measure. Put another way, the researcher must ask what 
these variables have in common. (p. 56)

It should be noted that the factors must be called something other than the name 
of a particular observed variable. The reason for this is that factors are latent 
aggregates of observed variables and the factor name should represent the aggregate 
and not be confused with a specific measured variable. 
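The researcher's task of asking what the high-loading variables have in common can be supported, though never replaced, by a simple listing. The Python sketch below assumes a rotated two-factor solution for six items from a hypothetical alcohol-dependence questionnaire; the item names, the loadings, and the .40 cutoff for a "salient" loading are all invented for illustration. It lists the salient items for each factor, highest first; choosing a label for each factor remains a substantive decision.

```python
# Hypothetical item names and rotated loadings (6 items x 2 factors).
ITEMS = ["craving", "tolerance", "withdrawal",
         "missed_work", "family_conflict", "legal_trouble"]
LOADINGS = [
    [0.78, 0.10],
    [0.71, 0.05],
    [0.66, 0.22],
    [0.12, 0.69],
    [0.20, 0.63],
    [0.35, 0.38],
]


def salient_items(items, loadings, cutoff=0.40):
    """Map each factor number to its (item, loading) pairs with |loading| >= cutoff."""
    result = {}
    for f in range(len(loadings[0])):
        hits = [(name, row[f]) for name, row in zip(items, loadings)
                if abs(row[f]) >= cutoff]
        # Strongest loadings first: they weigh most in naming the factor.
        result[f + 1] = sorted(hits, key=lambda pair: abs(pair[1]), reverse=True)
    return result


for factor, hits in salient_items(ITEMS, LOADINGS).items():
    listing = ", ".join(f"{name} ({lam:.2f})" for name, lam in hits)
    print(f"Factor {factor}: {listing}")
# Factor 1: craving (0.78), tolerance (0.71), withdrawal (0.66)
# Factor 2: missed_work (0.69), family_conflict (0.63)
```

The last hypothetical item reaches the cutoff on neither factor, which would itself be worth a closer look before the factors are named.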

At this point in the analysis, the minimum number of factors that can account for
the observed correlations has been identified and named. To obtain a more easily
interpretable solution regarding the factors, the researcher can engage in a process
known as rotation. This is most easily done by computer and, again, is too complicated
a matter for this paper. Rotation, however, yields "the simplest solution
among a potentially infinite number of solutions that are equally compatible with the
observed correlations" (Kim & Mueller, 1978, p. 59).
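Rotation is normally left to software, but the idea can be shown for two factors. The sketch below applies an orthogonal rotation to a hypothetical unrotated loading matrix and grid-searches for the angle that maximizes the raw varimax criterion (varimax is one common rotation method, chosen here as an example; the loadings and the grid size are assumptions). Because the rotation is orthogonal, each variable's communality (the sum of its squared loadings) is unchanged: the rotated solution is one of the "equally compatible" solutions, preferred because it is simpler to interpret.

```python
import math

# Hypothetical unrotated loadings: 4 variables (rows) x 2 factors (columns).
UNROTATED = [
    [0.70, 0.40],
    [0.65, 0.35],
    [0.60, -0.45],
    [0.55, -0.50],
]


def rotate(loadings, theta):
    """Rotate a two-factor loading matrix orthogonally by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[a * c + b * s, -a * s + b * c] for a, b in loadings]


def varimax_criterion(loadings):
    """Raw varimax criterion: sum over factors of the variance of squared loadings."""
    p = len(loadings)
    total = 0.0
    for j in range(len(loadings[0])):
        sq = [row[j] ** 2 for row in loadings]
        mean_sq = sum(sq) / p
        total += sum(v ** 2 for v in sq) / p - mean_sq ** 2
    return total


def varimax_two_factors(loadings, steps=1800):
    """Grid-search angles in [0, 90) degrees; the criterion repeats every 90 degrees."""
    angles = [k * (math.pi / 2) / steps for k in range(steps)]
    best = max(angles, key=lambda t: varimax_criterion(rotate(loadings, t)))
    return rotate(loadings, best), math.degrees(best)


def communalities(loadings):
    """Sum of squared loadings for each variable."""
    return [sum(v ** 2 for v in row) for row in loadings]


rotated, angle = varimax_two_factors(UNROTATED)
print(f"rotation angle: {angle:.1f} degrees")
for before, after in zip(UNROTATED, rotated):
    print(before, [round(v, 2) for v in after])
print([round(h, 4) for h in communalities(UNROTATED)])
print([round(h, 4) for h in communalities(rotated)])  # unchanged by rotation
```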

The process of exploratory factor analysis identifies the smallest number of underlying
factors compatible with the correlations among a larger set of initial variables on a test or
instrument. The process can be summarized as follows: the researcher (a) collects
observed scores (raw data) on an instrument without having a preconceived notion as
to the number of underlying factors, (b) presents this information in data matrices, (c)
correlates the variables, and (d) extracts, rotates, and interprets the factors underlying the variables.
Summary

Tests are not valid in and of themselves. Rather, test scores may be valid. 
Although many types of validity have been identified, construct validity has been 
suggested as encompassing all forms of validity. In addition, construct validity 
addresses the issue of whether a test does, in fact, measure what it purports to. 

Exploratory factor analysis serves a number of functions including identification 
of the number of factors that underlie a set of variables and determination as to 
whether the factors are correlated or uncorrelated. This process can be an aid in 
evaluating the score validity of a test via these two functions. 